0.1 Introduction

In this Rmarkdown we are going to explore Visium FFPE data.

0.2 FFPE Visium vs Fresh frozen FFPE

Check full description here. Visium FFPE has a different concept than their fresh-frozen counterpart. The fresh frozen permeabilizes the tissue, the mRNA hybridize via the poly-A tail with the oligo attached to the bead, then cDNA is synthesized and ultimately sequenced. Since the mRNA in FFPE tissue slices is of lower quality, shorter strands, we need to follow a different approach. FFPE Visium used whole transcriptome DNA probes that hybridizes to the mRNA molecules and is then ligated. These ligation products are released from the tissue and bind with the spatially barcoded oligonucelotides. Ultimately in this for Visium FFPE we sequence the synthetic probe DNA instead of the cDNA.
IMPORTANT, be ware that genes not targete by the probe set will NOT be detected.

A more in depth explanation can be found here.

Briefly, the whole-transcriptome probe panels contain paired probes for each targeted gene sequence. Both probes need to hybridize to the target gene and ligate to each other in order to be counted as a UMI in the final coun matrix, (by default non-ligated probes are removed since they can capturing off-target noise).

What implications does this have?
Since we are sequencing the DNA probes and not the cDNA we won’t be able to detect genetic variants, mRNAs not targeted by the synthetic DNA probes.

How do we get the count matrix? More info here and here.

In this case we need to use the latest version of spaceranger, v1.3.0. We will use the same command spaceranger count but there are 2 main differences:
1- Spaceranger in this case uses a short-read aligner aimed at maximizing the performance when mapping probe sequences.
2.1- We need to pass a probe set to spaceranger. This probe set contains a whole transcriptome reference file declaring the gene panel used for the experiment. This contains detailed information on the probes used and which genes they target. More info here.
2.2- Furhtermore, the probes can have some off-target effect with homologous genes or similar sequences, these probes are by default filtered from the analysis. These can be maintained by adding the flag --no-probe-filter when running spaceranger.

0.3 Libraries

library(Seurat)
library(ggpubr)
library(cowplot)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(glue)
library(stringr)
library(readr)

0.4 Load data

Data for this tissue slice can be downloaded directly from 10X public datasets here.

se_filtered <- Seurat::Load10X_Spatial(
  data.dir = here::here("data/"),
  filename = "Visium_FFPE_Human_Breast_Cancer_filtered_feature_bc_matrix.h5",
  slice = "ffpe",
  filter.matrix = TRUE)


se_unfiltered <- Seurat::Load10X_Spatial(
  data.dir = here::here("data/"),
  filename = "Visium_FFPE_Human_Breast_Cancer_raw_feature_bc_matrix.h5",
  slice = "ffpe",
  filter.matrix = FALSE)

0.5 Exploration

0.5.1 Visual inspection

p1 <- Seurat::SpatialPlot(
  object = se_unfiltered,
  features = c("nCount_Spatial", "nFeature_Spatial"),
  image.alpha = 0)

p2 <- Seurat::SpatialPlot(
    object = se_unfiltered,
    features = "nCount_Spatial",
    image.alpha = 1,
    alpha = c(0, 0)) + Seurat::NoLegend()

p1 + p2

Add mitochondrial and ribosomal %

# Collect all genes coded on the mitochondrial genome
se_unfiltered[["percent.mito"]] <- Seurat::PercentageFeatureSet(
  object = se_unfiltered,
  pattern = "^MT-")
summary(se_unfiltered[["percent.mito"]])
##   percent.mito
##  Min.   :0    
##  1st Qu.:0    
##  Median :0    
##  Mean   :0    
##  3rd Qu.:0    
##  Max.   :0
# Collect all genes coding for ribosomal proteins
se_unfiltered[["percent.ribo"]] <- Seurat::PercentageFeatureSet(
  object = se_unfiltered,
  pattern = "^RPL|^RPS")
summary(se_unfiltered[["percent.ribo"]])
##   percent.ribo     
##  Min.   :0.000000  
##  1st Qu.:0.000000  
##  Median :0.000000  
##  Mean   :0.008530  
##  3rd Qu.:0.009708  
##  Max.   :1.010101

We can see how the DNA probes ignore mitochondrial and ribosomal genes are widely excluded from the analysis since they are most likely not targeted by the probe set. This is definitely something to keep in mind when analyzing this type of data.

Seurat::SpatialPlot(
  object = se_unfiltered,
  features = c("percent.mito", "percent.ribo"),
  image.alpha = 0)

0.5.2 Count matrix inspection

Next we also want to take a look at the count matrix since the method used to obtain the UMI counts differs from scRNAseq and Fresh-Frozen Visium.

se_unfiltered@assays$Spatial@counts[1:15, 1:15]
## 15 x 15 sparse Matrix of class "dgCMatrix"
##                                          
## MIR1302-2HG . . . . . . . . . . . . . . .
## FAM138A     . . . . . . . . . . . . . . .
## OR4F5       . . . . . . . . . . . . . . .
## AL627309.1  . . . . . . . . . . . . . . .
## AL627309.3  . . . . . . . . . . . . . . .
## AL627309.2  . . . . . . . . . . . . . . .
## AL627309.5  . . . . . . . . . . . . . . .
## AL627309.4  . . . . . . . . . . . . . . .
## AP006222.2  . . . . . . . . . . . . . . .
## AL732372.1  . . . . . . . . . . . . . . .
## OR4F29      . . . . . . . . . . . . . . .
## AC114498.1  . . . . . . . . . . . . . . .
## OR4F16      . . . . . . . . . . . . . . .
## AL669831.2  . . . . . . . . . . . . . . .
## LINC01409   . . . . . . . . . . . . . . .
# How many genes do we detect
dim(se_unfiltered@assays$Spatial@counts)
## [1] 36945  4992
# Lets look at the how many UMIs we detect per spot
Hmisc::describe(sparseMatrixStats::colSums2(se_unfiltered@assays$Spatial@counts))
## sparseMatrixStats::colSums2(se_unfiltered@assays$Spatial@counts) 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##     4992        0     4019        1     9161    10653    194.0    323.1    934.0   5081.0  14849.0  24977.9  30529.6 
## 
## lowest :     1    16    23    29    33, highest: 46420 46793 48813 49350 53148
# Lets look at the how many UMI's we detect per gene
Hmisc::describe(sparseMatrixStats::rowSums2(se_unfiltered@assays$Spatial@counts))
## sparseMatrixStats::rowSums2(se_unfiltered@assays$Spatial@counts) 
##        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95 
##    36945        0     5367    0.889     1238     2189        0        0        0        2      740     2627     4613 
## 
## lowest :       0       1       2       3       4, highest:  467942  493262  530763  667822 2846583
##                                                                                                                           
## Value            0   50000  100000  150000  200000  250000  300000  350000  400000  450000  500000  550000  650000 2850000
## Frequency    36788     119      17       7       3       3       1       1       1       1       1       1       1       1
## Proportion   0.996   0.003   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
## 
## For the frequency table, variable is rounded to the nearest 50000

0.5.3 Normalization

0.5.3.1 log-normalization vs SCTransform

We want to see which normalization method most effectively reduces the technical bias in the data. We follow the comparison approach detailed in this Vignette.

Log-normalization using Seurat

# also run standard log normalization for comparison
se_filtered <- Seurat::NormalizeData(se_filtered,
                                     verbose = FALSE,
                                     assay = "Spatial")

SCTransform normalization

# rerun normalization to store sctransform residuals for all genes
se_filtered <- Seurat::SCTransform(se_filtered,
                                   assay = "Spatial",
                                   return.only.var.genes = FALSE,
                                   verbose = FALSE)

Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs per gene

se_filtered <- Seurat::GroupCorrelation(
  se_filtered,
  group.assay = "Spatial",
  assay = "Spatial",
  slot = "data",
  do.plot = FALSE)

se_filtered <- Seurat::GroupCorrelation(
  se_filtered,
  group.assay = "Spatial",
  assay = "SCT",
  slot = "scale.data",
  do.plot = FALSE)

Visualize the correlation between grouped by the mean expression

p1 <- Seurat::GroupCorrelationPlot(
  se_filtered,
  assay = "Spatial",
  cor = "nCount_Spatial_cor") +
  ggplot2::ggtitle("Log Normalization") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))

p2 <- Seurat::GroupCorrelationPlot(
  se_filtered,
  assay = "SCT",
  cor = "nCount_Spatial_cor") +
  ggplot2::ggtitle("SCTransform Normalization") + 
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))

p1 + p2

From the boxplot above we can see how SCT is slightly better at removing the technical noise, specially in those genes with high UMI counts.

0.5.3.2 Mean-Variance relation

Next we want to see if the log-normalization effectively removes the mean-variance relationship. For this we will use the filtered seurat object which contains only the spots overlapping the tissue.

Mean-Variance relation pre-normalization

p1 <- SCrafty::mean_variance_plot(
  se_obj = se_filtered,
  assay = "Spatial",
  slot = "counts") +
  ggplot2::ggtitle("Raw counts")

p2 <- SCrafty::mean_variance_plot(
  se_obj = se_filtered,
  assay = "Spatial",
  slot = "data") +
  ggplot2::ggtitle("Log Normalization")

p3 <- SCrafty::mean_variance_plot(
  se_obj = se_filtered,
  assay = "SCT",
  slot = "data") +
  ggplot2::ggtitle("SCTransform Normalization")

p1 / p2 / p3

In the above plots we can see how both log normalization and SCTransform equally remove the mean variance relation.

0.6 Session Info

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] readr_1.4.0        stringr_1.4.0      glue_1.4.2         RColorBrewer_1.1-2 dplyr_1.0.6        cowplot_1.1.1      ggpubr_0.4.0       ggplot2_3.3.3      SeuratObject_4.0.1 Seurat_4.0.2       BiocStyle_2.18.1  
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1            backports_1.2.1         Hmisc_4.5-0             plyr_1.8.6              igraph_1.2.6            lazyeval_0.2.2          splines_4.0.4           listenv_0.8.0           scattermore_0.7         digest_0.6.27           htmltools_0.5.1.1       magick_2.7.2            fansi_0.4.2             checkmate_2.0.0         magrittr_2.0.1          tensor_1.5              cluster_2.1.0           ROCR_1.0-11             openxlsx_4.2.3          globals_0.14.0          matrixStats_0.58.0      spatstat.sparse_2.0-0   jpeg_0.1-8.1            colorspace_2.0-1        ggrepel_0.9.1           haven_2.4.1             xfun_0.23               crayon_1.4.1            jsonlite_1.7.2          spatstat.data_2.1-0     survival_3.2-7          zoo_1.8-9               polyclip_1.10-0         gtable_0.3.0            leiden_0.3.8            car_3.0-10              future.apply_1.7.0      abind_1.4-5             scales_1.1.1            DBI_1.1.1               rstatix_0.7.0           miniUI_0.1.1.1          Rcpp_1.0.6              isoband_0.2.4           htmlTable_2.2.1         viridisLite_0.4.0       xtable_1.8-4            reticulate_1.20         spatstat.core_2.1-2    
##  [50] bit_4.0.4               foreign_0.8-81          Formula_1.2-4           htmlwidgets_1.5.3       httr_1.4.2              ellipsis_0.3.2          ica_1.0-2               farver_2.1.0            pkgconfig_2.0.3         nnet_7.3-15             sass_0.4.0              uwot_0.1.10             deldir_0.2-10           utf8_1.2.1              here_1.0.1              labeling_0.4.2          tidyselect_1.1.1        rlang_0.4.11            reshape2_1.4.4          later_1.2.0             munsell_0.5.0           cellranger_1.1.0        tools_4.0.4             generics_0.1.0          broom_0.7.6             ggridges_0.5.3          evaluate_0.14           fastmap_1.1.0           yaml_2.2.1              goftest_1.2-2           knitr_1.33              bit64_4.0.5             fitdistrplus_1.1-3      zip_2.1.1               purrr_0.3.4             RANN_2.6.1              sparseMatrixStats_1.2.1 pbapply_1.4-3           future_1.21.0           nlme_3.1-152            mime_0.10               SCrafty_0.1.0           rstudioapi_0.13         hdf5r_1.3.3             compiler_4.0.4          plotly_4.9.3            curl_4.3.1              png_0.1-7               ggsignif_0.6.1         
##  [99] spatstat.utils_2.1-0    tibble_3.1.2            bslib_0.2.5.1           stringi_1.6.2           highr_0.9               forcats_0.5.1           lattice_0.20-41         Matrix_1.3-3            vctrs_0.3.8             pillar_1.6.1            lifecycle_1.0.0         BiocManager_1.30.15     spatstat.geom_2.1-0     lmtest_0.9-38           jquerylib_0.1.4         RcppAnnoy_0.0.18        data.table_1.14.0       irlba_2.3.3             httpuv_1.6.1            patchwork_1.1.1         latticeExtra_0.6-29     R6_2.5.0                bookdown_0.22           promises_1.2.0.1        KernSmooth_2.23-18      gridExtra_2.3           rio_0.5.26              parallelly_1.25.0       codetools_0.2-18        MASS_7.3-53             assertthat_0.2.1        rprojroot_2.0.2         withr_2.4.2             sctransform_0.3.2       mgcv_1.8-33             parallel_4.0.4          hms_1.1.0               grid_4.0.4              rpart_4.1-15            tidyr_1.1.3             rmarkdown_2.8           MatrixGenerics_1.2.1    carData_3.0-4           Rtsne_0.15              base64enc_0.1-3         shiny_1.6.0